Genetics of Evolution

Preliminaries

If you are not already familiar with the structure of these exercises, read the Introduction first.

Note

Reminder: Save your work regularly.

Important

If you are using a Mac, we recommend that you use either Chrome or Firefox to complete these exercises. Some of the default settings in Safari prevent these exercises from running.

Contact information

If you have questions about these exercises, please contact Dr. Kevin Middleton (middletonk@missouri.edu) or drop by Tucker 224.

Learning objectives

The learning objectives for this exercise are:

  • Describe and identify the mechanisms by which variation arises and is fixed (or lost) in a population over time.
  • Model how random mating yields predicted genotype frequencies in Hardy-Weinberg Equilibrium (HWE), and how non-random mating affects allele and genotype frequencies.
  • Test whether HWE is present in a population.
  • Explain how the processes of drift, natural selection, migration, and mutation can affect the elimination, maintenance or increase in frequency of alleles in a population.

Mechanisms of biological evolution

Evolution is defined by the change in allele frequencies over time. In this context time refers to subsequent generations in a reproducing population. Two main scales of evolution are:

  1. Microevolution: both adaptive and non-adaptive (neutral) changes within populations across generations.
  2. Macroevolution: higher-level changes involving the origination and diversification of species.

Separating microevolution from macroevolution like this might make it appear that they are completely distinct processes. In reality:

  • A continuum exists between the two: microevolutionary change can lead to observable macroevolutionary patterns.
  • The fundamental process of change in allele frequencies over time operates the same in both.

One final point to remember is that populations don’t evolve in isolation. Species that live in communities with one another interact with both native and introduced species at many different trophic levels (Figure 1).

Image of a fly, two birds, and a louse.
Figure 1: In the Galápagos islands, the medium ground finch (Geospiza fortis, upper right) and Galápagos mockingbird (Mimus parvulus, lower left) are both under threat from an introduced nest fly (Philornis downsi, upper left). Simultaneously, these species have unique parasites, a feather mite and feather louse (lower right), that both respond to and impact the evolutionary history of these species. Image from University of Utah.

Hardy-Weinberg Equilibrium

If evolution is the change in allele frequencies over time, what defines the lack of change in allele frequencies? In population genetics, the state in which allele frequencies do not change is called Hardy-Weinberg Equilibrium (often abbreviated HWE). The equations for HWE were developed during the first decade of the 1900s, shortly after the rediscovery of Mendelian genetics.

HWE allows us to predict genotype (and thus phenotype) frequencies under a specific set of conditions in which there are no additional forces, either internal or external, acting on a population:

  1. Infinite population size
  2. All mating is random
  3. No migration
  4. No selection
  5. No mutation

There are many processes that can lead to deviations from Hardy-Weinberg Equilibrium.

For each of the conditions above, (1) give an example of a process that would lead to a deviation from HWE and (2) predict whether that process would lead to increased or decreased genetic variation in the subsequent generation.

Evolutionary biologists are often interested in determining whether a population is in Hardy-Weinberg Equilibrium. If a population is found to deviate from HWE, it suggests that at least one of the assumptions listed above is violated in that population.

Evolution of single genes

The first phenotypes that you learned about as well as those described in the first set of exercises (Transmission of Genetic Information) were Mendelian traits. In Mendelian traits, a single gene is responsible for a single trait. In this context, you also learned about dominant and recessive alleles (and their variations), which lead to different observable phenotypes.

The simplest case to use for exploring Hardy-Weinberg Equilibrium is a single gene with two alleles in a diploid organism. In this case, there are only three possible genotypes for the two alleles (p and q; see footnote 1):

  • pp
  • pq
  • qq

The HWE equation results from the basic rules of probability that you learned about in the first set of exercises. To explore HWE in a population, we will use the example of the Peppered moth (Biston betularia; Figure 2; see footnote 2).

(a) Dark morph
(b) Light morph
Figure 2: The Peppered moth (Biston betularia) exhibits two different color morphs, dark and light. These phenotypes are controlled by a single gene with a dominant allele, p. (a) Individuals with either pp or pq genotypes have the dark morph. (b) Light morphs have the qq genotype.

Imagine a population of Peppered moths with the following allele frequencies:

  • p = 0.1
  • q = 0.9

There are a few things to note here:

  • The frequencies summarize information about an entire population, not about any particular individuals.
  • The frequencies sum to 1: every copy of the gene is either p or q (just as a flipped coin is either heads or tails).
  • The frequencies don’t tell us whether one allele is dominant.

Probabilities of allele combinations

We can use the rules of probability that you have learned about to determine the probability of an individual having each of the possible genotypes: pp, qq, or pq. Because the two alleles an individual inherits are drawn independently under random mating, the probability of each genotype is just the product of the probabilities of its two alleles.

\[pp = 0.1 \times 0.1 = 0.01\]

\[qq = 0.9 \times 0.9 = 0.81\]

\[pq = (0.1 \times 0.9) + (0.1 \times 0.9) = 0.18\]

Because pq is not distinguishable from qp, we add the probabilities of each combination (0.1 x 0.9 = 0.09 and 0.9 x 0.1 = 0.09).

In whatever order we add up these probabilities, the sum is 1:

\[0.01 + 0.18 + 0.81 = 1\]

Thus, basic probability leads to the expectation for a population in HWE:

\[p^2 + 2pq + q^2 = 1\]

A population in Hardy-Weinberg Equilibrium will satisfy this equation.
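As a quick check, the identity holds for any allele frequency, not only for p = 0.1. The following is a minimal sketch in R; `hwe_genotype_freqs()` is a hypothetical helper name introduced here for illustration and is not part of the original exercise.

```r
# Hypothetical helper: expected genotype frequencies under HWE
# for one gene with two alleles at frequencies p and q = 1 - p.
hwe_genotype_freqs <- function(p) {
  q <- 1 - p
  c(pp = p^2, pq = 2 * p * q, qq = q^2)
}

# Frequencies for the Peppered moth example (p = 0.1)
freqs <- hwe_genotype_freqs(0.1)
print(freqs)

# The three genotype frequencies sum to 1
print(sum(freqs))
```

Calling the helper with other values between 0 and 1 should also give frequencies that sum to 1.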

Counts of genotypes

We start with a population of 1,000 Peppered moth individuals that is in HWE. As above, the probability of p is 0.1 and of q is 1 - 0.1 = 0.9.

We expect the following genotypes in the population:
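One way to compute these expected counts is sketched below. The names `pop_size` and `p` come from the text; `q`, `n_pp`, `n_pq`, and `n_qq` are illustrative names assumed here.

```r
# Starting parameters from the text
pop_size <- 1000   # number of individuals
p <- 0.1           # frequency of the p allele
q <- 1 - p         # frequency of the q allele

# Expected number of individuals with each genotype under HWE
n_pp <- pop_size * p^2
n_pq <- pop_size * 2 * p * q
n_qq <- pop_size * q^2

print(c(pp = n_pp, pq = n_pq, qq = n_qq))
```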

Notice that we calculate the probability of q as 1 - p, so we only have to change the value of p. With these starting parameters, there are 10 pp, 180 pq, and 810 qq individuals.

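If every individual contributes equally to the gene pool, then pp individuals contribute only p alleles, pq individuals contribute p and q alleles in equal numbers, and qq individuals contribute only q alleles. For example, if each individual makes 10 gametes, a pp individual contributes 10 p alleles, a pq individual contributes on average 5 p and 5 q alleles, and a qq individual contributes 10 q alleles. Because only the proportions matter, we can equivalently count 2 alleles per individual: 2 p alleles for pp, 1 p and 1 q allele for pq, and 2 q alleles for qq.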

We will have the following numbers of alleles represented and use those to calculate the resulting frequencies of p and q in the next generation.
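A sketch of this calculation follows. It repeats the starting parameters so that it runs by itself, and it counts two alleles per individual. The names `n_p`, `n_q`, `p_next`, and `q_next` are assumptions made for illustration.

```r
# Starting parameters from the text (repeated so this block runs alone)
pop_size <- 1000
p <- 0.1
q <- 1 - p

# Expected genotype counts under HWE
n_pp <- pop_size * p^2
n_pq <- pop_size * 2 * p * q
n_qq <- pop_size * q^2

# Alleles contributed to the gene pool, counting two per individual
n_p <- 2 * n_pp + n_pq
n_q <- 2 * n_qq + n_pq

# Allele frequencies in the next generation
p_next <- n_p / (n_p + n_q)
q_next <- n_q / (n_p + n_q)
print(c(p = p_next, q = q_next))
```

Counting two alleles per individual gives the same proportions as any larger equal number of gametes per individual, so the choice does not affect the resulting frequencies.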

In the first code block above, iteratively change the values of pop_size and p. Start by leaving p at 0.1 and change pop_size to larger or smaller values. Run the first code block and then the second. See how the frequencies of p and q change. Then set pop_size to 1000 and change the value for p to some number between 0 and 1. Again run the first code block and then the second.

Testing for HWE in a population

A population in Hardy-Weinberg Equilibrium will satisfy the equation:

\[p^2 + 2pq + q^2 = 1\]

How can we use this information to statistically test whether a population satisfies the assumptions of HWE?

In the first set of exercises (Transmission of Genetic Information and Case study: Investigating a newly discovered muscle mutation in mice), you learned about using the chi-squared test to determine if predicted counts of births per day and counts of mice from a test-cross showing the small muscle phenotype matched the theoretical predictions.

Because Hardy-Weinberg Equilibrium allows us to predict the frequencies of genotypes in a population from the allele frequencies, we can compare the observed counts of genotypes to the predicted counts of genotypes.

If the population is in HWE, then the test will not be significant. Remember that the chi-squared test is really testing for deviations from the predictions. So the test will be significant (P < 0.05) if the counts do not match what we expect.

Imagine a population with 500 diploid individuals. These 500 individuals will have 1000 total alleles. You observe the following genotypes among the 500 individuals:

  • pp: 5
  • pq: 95
  • qq: 400

We first calculate the counts for the p and q alleles from the counts of individuals. Each pp individual contributes 2 p and each pq individual contributes 1.
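The allele counts can be tallied as in the sketch below. The q allele is counted the same way as p: 2 per qq individual and 1 per pq individual. The names `obs`, `n_p`, and `n_q` are illustrative assumptions.

```r
# Observed genotype counts from the text
obs <- c(pp = 5, pq = 95, qq = 400)

# Each homozygote carries two copies of its allele;
# each heterozygote carries one copy of each allele.
n_p <- unname(2 * obs["pp"] + obs["pq"])
n_q <- unname(2 * obs["qq"] + obs["pq"])

print(c(p = n_p, q = n_q))
```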

From the counts we can calculate the frequencies:
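A minimal sketch of this step divides each allele count by the total number of alleles. Variable names are assumed for illustration.

```r
# Allele counts derived from the observed genotypes (5 pp, 95 pq, 400 qq)
n_p <- 2 * 5 + 95     # copies of the p allele
n_q <- 2 * 400 + 95   # copies of the q allele

# Frequency = allele count / total number of alleles
p <- n_p / (n_p + n_q)
q <- n_q / (n_p + n_q)
print(c(p = p, q = q))
```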

p = 0.105 and q = 0.895 are the observed frequencies in the population.

We use these two values to calculate the expected frequencies for pp, pq, and qq, assuming the population is in HWE.
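The expected genotype frequencies follow from the HWE equation. In this sketch, `exp_freq` is an illustrative name assumed here.

```r
# Observed allele frequencies from the text
p <- 0.105
q <- 0.895

# Expected genotype frequencies under HWE: p^2, 2pq, q^2
exp_freq <- c(pp = p^2, pq = 2 * p * q, qq = q^2)
print(exp_freq)
```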

The expected frequencies for pp, pq, and qq are converted into expected counts for 500 individuals:
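A sketch of the conversion multiplies each expected frequency by the number of individuals. The names `n_individuals`, `exp_freq`, and `exp_count` are assumptions for illustration.

```r
n_individuals <- 500   # population size from the text
p <- 0.105
q <- 0.895

# Expected genotype frequencies under HWE
exp_freq <- c(pp = p^2, pq = 2 * p * q, qq = q^2)

# Expected number of individuals with each genotype
exp_count <- n_individuals * exp_freq
print(exp_count)
```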

You will notice that we have fractional individuals: we expect 5.5125 pp individuals. While this would not happen in real life, for our calculations, it is not a problem.

Recall that the equation for a chi-squared test is:

\[\chi^2 = \sum_{i=1}^n \frac{(Observed_i - Expected_i)^2}{Expected_i}\]
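As a worked instance of this formula, take the observed counts from the example above (5, 95, and 400) and the expected counts under HWE for p = 0.105 and q = 0.895 (5.5125, 93.975, and 400.5125). The individual terms are rounded, so treat the total as approximate:

\[\chi^2 = \frac{(5 - 5.5125)^2}{5.5125} + \frac{(95 - 93.975)^2}{93.975} + \frac{(400 - 400.5125)^2}{400.5125} \approx 0.0476 + 0.0112 + 0.0007 \approx 0.0595\]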

We have all the information we need to carry out the chi-squared test. All we have to do is organize the data into a format that we can use for the test.

We will make a data structure that holds the observed and expected counts:
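One possible layout is a data frame with one row per genotype. The column names Observed and Expected follow the text; the object name `hwe_counts` and the Genotype column are assumptions added for readability.

```r
# Observed allele frequencies from the text
p <- 0.105
q <- 0.895

# One row per genotype: observed counts and HWE expectations for 500 individuals
hwe_counts <- data.frame(
  Genotype = c("pp", "pq", "qq"),
  Observed = c(5, 95, 400),
  Expected = 500 * c(p^2, 2 * p * q, q^2)
)
print(hwe_counts)
```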

And then performing the chi-squared test is just a matter of calling the function chisq.test():
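A sketch of the call is shown below. It rebuilds the counts so that it runs by itself; the object names `hwe_counts` and `hwe_test` are assumptions, while `chisq.test()` and its arguments `x`, `p`, and `rescale.p` are part of base R.

```r
# Observed and expected genotype counts for the example population
p <- 0.105
q <- 0.895
hwe_counts <- data.frame(
  Observed = c(5, 95, 400),
  Expected = 500 * c(p^2, 2 * p * q, q^2)
)

# x = observed counts; p = expected values, rescaled to sum to 1
hwe_test <- chisq.test(x = hwe_counts$Observed,
                       p = hwe_counts$Expected,
                       rescale.p = TRUE)
print(hwe_test)
```

With these counts the test is not significant (P is well above 0.05), so there is no evidence of a deviation from HWE. Note that `chisq.test()` uses 2 degrees of freedom here (the number of genotype classes minus one). Tests of HWE are often evaluated with 1 degree of freedom instead, because the allele frequency was estimated from the same data.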

The Observed column holds the observed counts of each genotype. The Expected column holds the expected counts of each genotype if the population is in HWE. Setting rescale.p = TRUE tells the function to rescale the supplied Expected values so that they sum to 1, which lets us pass expected counts rather than probabilities.

Case Study: Testing for HWE in Small Muscle Phenotype mice

Let’s apply what we have learned to the data for mice that have the small muscle phenotype from an earlier exercise: Case study: Investigating a newly discovered muscle mutation in mice.

Of the three lines shown, the frequency of the small muscle phenotype appeared to increase in two (lines 3 and 6; Figure 3). Although all three started at the same low level, the red lines appear to increase rapidly starting at about generation 7.

Figure 3: Frequency of mice exhibiting the small muscle phenotype in three lines across the first 22 generations of selection. Data from Garland et al. (2002).

We can use what we have learned to test whether it is likely that these breeding lines deviate from HWE. If so, it would suggest that the allele for the small-muscle phenotype was being selected for in the Selected lines (red lines in Figure 3).

To fill in the equation for the chi-squared test, we need to know the observed and expected counts for genotypes. We can use the frequencies in the table to determine the counts for each. We will only consider two generations: 14 and 22 for this test.

If these lines are in HWE, then the frequencies of the mini-muscle allele will not change between these generations.

In this experiment, each generation has about 200 individuals (the population size). Because the “mini-muscle” allele is an autosomal recessive, we know that any individuals with the phenotype are homozygous (pp).

Here are the counts of animals with the small-muscle phenotype at generations 14 and 22.

  • Generation 14: 10
  • Generation 22: 19

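We can use this information to estimate the frequency of the p allele at generation 14. With 10 affected (pp) animals out of a population size of 200, the frequency of the pp genotype is 10/200 = 0.05. (Equivalently, these animals carry 20 of the 400 alleles in the population.) We cannot count the p alleles directly, because heterozygous (pq) animals also carry one p allele but do not show the phenotype. If we assume HWE, the frequency of pp is \(p^2\), so we take the square root of 0.05 to get \(p\).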
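This estimate can be sketched in R as follows. The names `n_affected`, `pop_size`, and `freq_pp` are illustrative assumptions, and the calculation assumes HWE so that the frequency of pp equals \(p^2\).

```r
n_affected <- 10   # small-muscle (pp) animals at generation 14
pop_size <- 200    # approximate number of animals per generation

# Observed frequency of the pp genotype, treated as p^2 under HWE
freq_pp <- n_affected / pop_size

# Allele frequencies
p <- sqrt(freq_pp)
q <- 1 - p
print(round(c(p = p, q = q), 3))
```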

The allele frequencies are p = 0.224 (“mini-muscle” allele) and q = 0.776 (“wild type”).

Using these frequencies we can determine the observed counts of animals (out of 200) with each of the genotype combinations.
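A sketch of this step is given below. Because heterozygotes cannot be identified from the phenotype, the pq and qq counts are the values implied by the allele frequencies under the HWE assumption rather than direct observations. The name `gen14_counts` is an assumption for illustration.

```r
pop_size <- 200

# Allele frequencies estimated from 10 pp animals at generation 14
p <- sqrt(10 / pop_size)
q <- 1 - p

# Genotype counts (out of 200) implied by these frequencies under HWE
gen14_counts <- pop_size * c(pp = p^2, pq = 2 * p * q, qq = q^2)
print(round(gen14_counts, 1))
```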

What is your prediction?

Line 6:

  • Generation 14: 10
  • Generation 22: 136

Exploring the assumptions of HWE

Mechanism of selection

  • Moving population average
  • Drift vs. selection
    • Effect of sample size on selection and drift in phenotypic evolution
      • Drift as sampling error
      • Drift has a larger effect in small populations

Populations have significant underlying variation

Intuition of moving mean

Feedback

We would appreciate your anonymous feedback on this exercise. If you choose to, please fill out this optional 4-question survey to help us improve.

References

Garland, T., Jr, M. T. Morgan, J. G. Swallow, J. S. Rhodes, I. Girard, J. G. Belter, and P. A. Carter. 2002. Evolution of a Small-Muscle Polymorphism in Lines of House Mice Selected for High Activity Levels. Evolution 56:1267–1275. Wiley.

Footnotes

  1. p and q are most commonly used as the allele names, but you could substitute any pair: A and B, A1 and A2, etc.

  2. Thanks to Dr. Elizabeth King for this example.